Comparison of GPU architectures for asynchronous communication with finite-differencing applications

Authors

Daniel P. Playne

Kenneth A. Hawick

Abstract

Graphical Processing Units (GPUs) are good data-parallel performance accelerators for solving regular mesh partial differential equations (PDEs) whereby low-latency communications and high compute to communications ratios can yield very high levels of computational efficiency. Finite-difference time-domain methods still play an important role for many PDE applications. Iterative multi-grid and multilevel algorithms can converge faster than ordinary finite difference methods but can be much more difficult to parallelise with GPU memory constraints. We report on some practical algorithmic and data layout approaches and on performance data on a range of GPUs with CUDA. We focus on the use of multiple GPU devices with a single CPU host and the asynchronous CPU/GPU communications issues involved. We obtain more than two orders of magnitude of speedup over a comparable CPU core.
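The multi-device pattern the abstract refers to can be illustrated with a small sketch. This is not the authors' implementation: it assumes a 1D explicit heat equation, uses two Python lists as stand-ins for the sub-meshes held by two GPUs, and performs the halo (boundary cell) exchange synchronously. The names `step`, `run_single`, and `run_split` are hypothetical. In a real CUDA code the exchange would be issued as asynchronous copies so that it overlaps with the interior update.

```python
# Sketch only: two list "sub-meshes" stand in for two GPU devices.
# Assumption: 1D explicit heat equation with fixed end values.

def step(u, alpha):
    """One explicit finite-difference update; the two end cells are left unchanged."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new

def run_single(u0, alpha, steps):
    """Reference run on the whole mesh."""
    u = u0[:]
    for _ in range(steps):
        u = step(u, alpha)
    return u

def run_split(u0, alpha, steps):
    """Same run with the mesh split in two, each half padded with one halo cell."""
    mid = len(u0) // 2
    left = u0[:mid + 1]    # owns cells 0..mid-1, last entry is a halo copy of cell mid
    right = u0[mid - 1:]   # first entry is a halo copy of cell mid-1, owns cells mid..n-1
    for _ in range(steps):
        left = step(left, alpha)
        right = step(right, alpha)
        # Halo exchange: each half receives its neighbour's updated edge cell.
        left[-1] = right[1]
        right[0] = left[-2]
    return left[:-1] + right[1:]
```

Running both functions on the same initial mesh should give matching results, which is the basic correctness check for any halo-exchange decomposition before communication is made asynchronous.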

Similar resources

Asynchronous Communication for Finite-Difference Simulations on GPU Clusters using CUDA and MPI

Graphical Processing Units (GPUs) are finding widespread use as accelerators in computer clusters. It is not yet trivial to program applications that use multiple GPU-enabled cluster nodes efficiently. A key aspect of this is managing effective communication between GPU memory on separate devices on separate nodes. We develop an algorithmic framework for Finite-Difference numerical simulations t...

Voltage Differencing Buffered Amplifier based Voltage Mode Four Quadrant Analog Multiplier and its Applications

In this paper, a voltage-mode four-quadrant analog multiplier (FQAM) using a voltage differencing buffered amplifier (VDBA), based on the quarter-square algebraic identity, is presented. In the proposed FQAM, the passive resistor can be implemented using MOSFETs operating in the saturation region, thereby making it suitable for integration. The effect of non-idealities of the VDBA has also been analyzed in this pa...

A Novel Multiply-Accumulator Unit Bus Encoding Architecture for Image Processing Applications

In CMOS circuits, power dissipation is a major concern for VLSI functional units. With shrinking feature size, increased frequency and power dissipation on the data bus have become the most important factors compared to other parts of the functional units. One of the most important functional units in any processor is the Multiply-Accumulator unit (MAC). The current work focuses on the develop...

Hybrid CPU-GPU Pipeline Framework PDPTA’14

The pipeline pattern for parallel programs is utilized in a wide array of scientific applications designed for execution on hybrid CPU-GPU architectures. However, there is a dearth of tools and libraries to support implementation of pipeline parallelism for hybrid architectures. We present the Hybrid Pipeline Framework (HyPi) that is intended to fill this gap. HyPi provides high level abstracti...

Exploiting Batch Processing on Streaming Architectures to Solve 2D Elliptic Finite Element Problems: A Hybridized Discontinuous Galerkin (HDG) Case Study

Numerical methods for elliptic partial differential equations (PDEs) within both continuous (CG) and hybridized discontinuous Galerkin (HDG) frameworks share the same general structure: local (elemental) matrix generation followed by a global linear system assembly and solve. The lack of inter-element communication and easily parallelizable nature of the local matrix generation stage coupled wi...

Journal:

Concurrency and Computation: Practice and Experience

Volume 24

Publication date: 2012
